Overview

Dataset statistics

Number of variables: 24
Number of observations: 65628
Missing cells: 4
Missing cells (%): < 0.1%
Duplicate rows: 7399
Duplicate rows (%): 11.3%
Total size in memory: 12.0 MiB
Average record size in memory: 192.0 B
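
The percentages and the memory total in this table can be re-derived from the raw counts. The sketch below only redoes that arithmetic with the figures copied from the table; the variable names are illustrative and are not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 24
n_observations = 65628
missing_cells = 4
duplicate_rows = 7399
avg_record_bytes = 192.0

# Share of missing cells among all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Share of rows that duplicate another row.
duplicate_pct = 100 * duplicate_rows / n_observations

# Total size in MiB, assuming it equals rows x average record size.
total_mib = n_observations * avg_record_bytes / 2**20

print(f"Missing cells (%): {missing_pct:.4f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

The missing-cell share is far below 0.1%, which is why the report shows it as "< 0.1%".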

Variable types

Unsupported: 1
Categorical: 12
Numeric: 11

Alerts

targetRelease has constant value "AIR" Constant
CONTINENT has constant value "EUROPE" Constant
Dataset has 7399 (11.3%) duplicate rows Duplicates
EPRTRAnnexIMainActivityLabel has a high cardinality: 71 distinct values High cardinality
FacilityInspireID has a high cardinality: 7185 distinct values High cardinality
facilityName has a high cardinality: 7930 distinct values High cardinality
City has a high cardinality: 5136 distinct values High cardinality
REPORTER NAME has a high cardinality: 45016 distinct values High cardinality
CITY ID has a high cardinality: 5136 distinct values High cardinality
EPRTRAnnexIMainActivityCode has a high cardinality: 70 distinct values High cardinality
max_wind_speed is highly correlated with avg_wind_speed and 1 other field High correlation
avg_wind_speed is highly correlated with max_wind_speed and 1 other field High correlation
min_wind_speed is highly correlated with max_wind_speed and 1 other field High correlation
max_temp is highly correlated with avg_temp and 1 other field High correlation
avg_temp is highly correlated with max_temp and 1 other field High correlation
min_temp is highly correlated with max_temp and 1 other field High correlation
max_wind_speed is highly correlated with avg_wind_speed and 1 other field High correlation
avg_wind_speed is highly correlated with max_wind_speed and 1 other field High correlation
min_wind_speed is highly correlated with max_wind_speed and 1 other field High correlation
max_temp is highly correlated with avg_temp and 1 other field High correlation
avg_temp is highly correlated with max_temp and 1 other field High correlation
min_temp is highly correlated with max_temp and 1 other field High correlation
max_wind_speed is highly correlated with avg_wind_speed High correlation
avg_wind_speed is highly correlated with max_wind_speed and 1 other field High correlation
min_wind_speed is highly correlated with avg_wind_speed High correlation
max_temp is highly correlated with avg_temp and 1 other field High correlation
avg_temp is highly correlated with max_temp and 1 other field High correlation
min_temp is highly correlated with max_temp and 1 other field High correlation
EPRTRAnnexIMainActivityCode is highly correlated with pollutant and 4 other fields High correlation
pollutant is highly correlated with EPRTRAnnexIMainActivityCode and 3 other fields High correlation
CONTINENT is highly correlated with EPRTRAnnexIMainActivityCode and 5 other fields High correlation
countryName is highly correlated with CONTINENT and 1 other field High correlation
targetRelease is highly correlated with EPRTRAnnexIMainActivityCode and 5 other fields High correlation
EPRTRAnnexIMainActivityLabel is highly correlated with EPRTRAnnexIMainActivityCode and 4 other fields High correlation
eprtrSectorName is highly correlated with EPRTRAnnexIMainActivityCode and 3 other fields High correlation
countryName is highly correlated with EPRTRAnnexIMainActivityLabel and 2 other fields High correlation
eprtrSectorName is highly correlated with EPRTRAnnexIMainActivityLabel and 3 other fields High correlation
EPRTRAnnexIMainActivityLabel is highly correlated with countryName and 4 other fields High correlation
pollutant is highly correlated with eprtrSectorName and 3 other fields High correlation
MONTH is highly correlated with max_temp and 2 other fields High correlation
max_wind_speed is highly correlated with avg_wind_speed and 1 other field High correlation
avg_wind_speed is highly correlated with max_wind_speed and 1 other field High correlation
min_wind_speed is highly correlated with max_wind_speed and 1 other field High correlation
max_temp is highly correlated with MONTH and 2 other fields High correlation
avg_temp is highly correlated with MONTH and 2 other fields High correlation
min_temp is highly correlated with MONTH and 2 other fields High correlation
DAY WITH FOGS is highly correlated with countryName High correlation
EPRTRAnnexIMainActivityCode is highly correlated with countryName and 4 other fields High correlation
EPRTRSectorCode is highly correlated with eprtrSectorName and 3 other fields High correlation
REPORTER NAME is uniformly distributed Uniform
df_index is an unsupported type, check if it needs cleaning or further analysis Unsupported
DAY WITH FOGS has 18771 (28.6%) zeros Zeros
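
Several of these alerts reduce to simple counting rules over one column. The sketch below illustrates the Constant, High cardinality and Zeros checks with the standard library only. The cut-off of 50 distinct values is an assumption chosen to be consistent with the alerts listed here (70 distinct values are flagged); it is not read from the report's configuration, and the function names are hypothetical.

```python
from collections import Counter

# Assumed cut-off, not taken from the report's config.json.
HIGH_CARDINALITY_THRESHOLD = 50


def column_alerts(values, threshold=HIGH_CARDINALITY_THRESHOLD):
    """Return the alert tags one column would trigger in this sketch."""
    n_distinct = len(Counter(values))
    alerts = []
    if n_distinct == 1:
        alerts.append("Constant")
    if n_distinct > threshold:
        alerts.append("High cardinality")
    return alerts


def zeros_pct(n_zeros, n_rows):
    """Share of zero values in a numeric column, in percent."""
    return 100 * n_zeros / n_rows


print(column_alerts(["AIR"] * 5))            # a constant column
print(column_alerts(list(range(71))))        # 71 distinct values
print(round(zeros_pct(18771, 65628), 1))     # DAY WITH FOGS zeros share
```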

Reproduction

Analysis started: 2022-05-21 11:42:05.088624
Analysis finished: 2022-05-21 11:42:53.798946
Duration: 48.71 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
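
The reported duration follows from the two timestamps. A minimal check with the standard library, using the values exactly as printed in this section:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2022-05-21 11:42:05.088624")
finished = datetime.fromisoformat("2022-05-21 11:42:53.798946")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```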

Variables

df_index
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 512.8 KiB

countryName
Categorical

HIGH CORRELATION

Distinct: 32
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 512.8 KiB

Value | Count
United Kingdom | 9016
Germany | 8768
France | 7365
Spain | 7017
Italy | 6280
Other values (27) | 27182

Length

Max length: 14
Median length: 11
Mean length: 7.585222771
Min length: 5
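
Length statistics of this kind are simple aggregates over the string length of each value. The sketch below computes them for the five sample rows shown for this variable, and separately cross-checks the reported mean against the total character count (497803) and the number of observations (65628) given in this report. The helper name is hypothetical.

```python
from statistics import mean, median


def length_stats(values):
    """Max, median, mean and min string length of a list of values."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# The five sample rows listed for countryName, not the full column.
sample_rows = ["Germany", "Italy", "Spain", "Czechia", "Finland"]
stats = length_stats(sample_rows)
print(stats)

# Cross-check of the reported mean: total characters / observations.
reported_mean = 497803 / 65628
print(round(reported_mean, 6))
```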

Characters and Unicode

Total characters: 497803
Distinct characters: 41
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
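
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over a list of strings; the mapping to long category names covers only the three categories that occur in this variable. Script and block properties are not available from `unicodedata`, so they are left out here.

```python
import unicodedata
from collections import Counter

# Long names for the general categories seen in this variable.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count Unicode general categories over all characters in values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(["United Kingdom"])
for code, n in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), n)
```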

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Germany
2nd row: Italy
3rd row: Spain
4th row: Czechia
5th row: Finland

Common Values

Value | Count | Frequency (%)
United Kingdom | 9016 | 13.7%
Germany | 8768 | 13.4%
France | 7365 | 11.2%
Spain | 7017 | 10.7%
Italy | 6280 | 9.6%
Poland | 4252 | 6.5%
Netherlands | 2347 | 3.6%
Finland | 2271 | 3.5%
Sweden | 2091 | 3.2%
Belgium | 1875 | 2.9%
Other values (22) | 14346 | 21.9%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
united | 9016 | 12.1%
kingdom | 9016 | 12.1%
germany | 8768 | 11.7%
france | 7365 | 9.9%
spain | 7017 | 9.4%
italy | 6280 | 8.4%
poland | 4252 | 5.7%
netherlands | 2347 | 3.1%
finland | 2271 | 3.0%
sweden | 2091 | 2.8%
Other values (23) | 16221 | 21.7%
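
The word table above is consistent with splitting each value into lowercase tokens, so that "United Kingdom" contributes one count each to "united" and "kingdom". The following sketch shows one way to produce such counts. The tokenisation rule (lowercase, split on whitespace, strip surrounding punctuation, drop empty tokens) is an assumption and may differ from what the profiling tool does.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercase word tokens across a list of category values."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            # Remove leading and trailing punctuation, e.g. "(nox)" -> "nox".
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts


counts = word_counts(["United Kingdom", "Germany", "United Kingdom"])
print(counts.most_common())
```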

Most occurring characters

Value | Count | Frequency (%)
n | 61582 | 12.4%
a | 54544 | 11.0%
e | 44796 | 9.0%
i | 37527 | 7.5%
d | 31510 | 6.3%
r | 27901 | 5.6%
m | 22719 | 4.6%
l | 22193 | 4.5%
t | 21920 | 4.4%
o | 17878 | 3.6%
Other values (31) | 155233 | 31.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 414143 | 83.2%
Uppercase Letter | 74644 | 15.0%
Space Separator | 9016 | 1.8%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 61582 | 14.9%
a | 54544 | 13.2%
e | 44796 | 10.8%
i | 37527 | 9.1%
d | 31510 | 7.6%
r | 27901 | 6.7%
m | 22719 | 5.5%
l | 22193 | 5.4%
t | 21920 | 5.3%
o | 17878 | 4.3%
Other values (13) | 71573 | 17.3%
Uppercase Letter
Value | Count | Frequency (%)
S | 10815 | 14.5%
G | 9705 | 13.0%
F | 9636 | 12.9%
U | 9016 | 12.1%
K | 9016 | 12.1%
I | 7706 | 10.3%
P | 5465 | 7.3%
B | 2724 | 3.6%
N | 2711 | 3.6%
C | 2116 | 2.8%
Other values (7) | 5734 | 7.7%
Space Separator
Value | Count | Frequency (%)
(space) | 9016 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 488787 | 98.2%
Common | 9016 | 1.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 61582 | 12.6%
a | 54544 | 11.2%
e | 44796 | 9.2%
i | 37527 | 7.7%
d | 31510 | 6.4%
r | 27901 | 5.7%
m | 22719 | 4.6%
l | 22193 | 4.5%
t | 21920 | 4.5%
o | 17878 | 3.7%
Other values (30) | 146217 | 29.9%
Common
Value | Count | Frequency (%)
(space) | 9016 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 497803 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 61582 | 12.4%
a | 54544 | 11.0%
e | 44796 | 9.0%
i | 37527 | 7.5%
d | 31510 | 6.3%
r | 27901 | 5.6%
m | 22719 | 4.6%
l | 22193 | 4.5%
t | 21920 | 4.4%
o | 17878 | 3.6%
Other values (31) | 155233 | 31.2%

eprtrSectorName
Categorical

HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 512.8 KiB

Value | Count
Energy sector | 24562
Waste and wastewater management | 15889
Mineral industry | 10188
Chemical industry | 4334
Paper and wood production and processing | 3817
Other values (4) | 6838

Length

Max length: 63
Median length: 46
Mean length: 22.79850064
Min length: 13

Characters and Unicode

Total characters: 1496220
Distinct characters: 32
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mineral industry
2nd row: Mineral industry
3rd row: Waste and wastewater management
4th row: Energy sector
5th row: Waste and wastewater management

Common Values

Value | Count | Frequency (%)
Energy sector | 24562 | 37.4%
Waste and wastewater management | 15889 | 24.2%
Mineral industry | 10188 | 15.5%
Chemical industry | 4334 | 6.6%
Paper and wood production and processing | 3817 | 5.8%
Production and processing of metals | 3154 | 4.8%
Intensive livestock production and aquaculture | 2144 | 3.3%
Animal and vegetable products from the food and beverage sector | 1305 | 2.0%
Other activities | 235 | 0.4%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: Category Frequency Plot]

Value | Count | Frequency (%)
and | 31431 | 15.4%
sector | 25867 | 12.6%
energy | 24562 | 12.0%
waste | 15889 | 7.8%
wastewater | 15889 | 7.8%
management | 15889 | 7.8%
industry | 14522 | 7.1%
mineral | 10188 | 5.0%
production | 9115 | 4.5%
processing | 6971 | 3.4%
Other values (17) | 34313 | 16.8%

Most occurring characters

Value | Count | Frequency (%)
e | 176519 | 11.8%
a | 140807 | 9.4%
(space) | 139008 | 9.3%
n | 134160 | 9.0%
t | 127266 | 8.5%
r | 117225 | 7.8%
s | 95091 | 6.4%
o | 69220 | 4.6%
d | 61495 | 4.1%
c | 52115 | 3.5%
Other values (22) | 383314 | 25.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1291584
86.3%
Space Separator139008
 
9.3%
Uppercase Letter65628
 
4.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e176519
13.7%
a140807
10.9%
n134160
10.4%
t127266
9.9%
r117225
9.1%
s95091
 
7.4%
o69220
 
5.4%
d61495
 
4.8%
c52115
 
4.0%
i51428
 
4.0%
Other values (13)266258
20.6%
Uppercase Letter
ValueCountFrequency (%)
E24562
37.4%
W15889
24.2%
M10188
15.5%
P6971
 
10.6%
C4334
 
6.6%
I2144
 
3.3%
A1305
 
2.0%
O235
 
0.4%
Space Separator
Value | Count | Frequency (%)
(space) | 139008 | 100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1357212
90.7%
Common139008
 
9.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e176519
13.0%
a140807
10.4%
n134160
9.9%
t127266
 
9.4%
r117225
 
8.6%
s95091
 
7.0%
o69220
 
5.1%
d61495
 
4.5%
c52115
 
3.8%
i51428
 
3.8%
Other values (21)331886
24.5%
Common
Value | Count | Frequency (%)
(space) | 139008 | 100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1496220
100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 176519 | 11.8%
a | 140807 | 9.4%
(space) | 139008 | 9.3%
n | 134160 | 9.0%
t | 127266 | 8.5%
r | 117225 | 7.8%
s | 95091 | 6.4%
o | 69220 | 4.6%
d | 61495 | 4.1%
c | 52115 | 3.5%
Other values (22) | 383314 | 25.6%

EPRTRAnnexIMainActivityLabel
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 71
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 512.8 KiB

Value | Count
Thermal power stations and other combustion installations | 21527
Landfills (excluding landfills of inert waste and landfills, which were definitely closed before 16.7.2001 or for which the after-care phase required by the competent authorities according to Article 13 of Council Directive 1999/31/EC of 26 April 1999 on the landfill of waste has expired) | 10452
Installations for the incineration of non-hazardous waste in the scope of Directive 2000/76/EC of the European Parliament and of the Council of 4 December 2000 on the incineration of waste | 3454
Installations for the production of cement clinker in rotary kilns | 3300
Installations for the manufacture of glass, including glass fibre | 2725
Other values (66) | 24170

Length

Max length: 289
Median length: 276
Mean length: 126.407768
Min length: 10

Characters and Unicode

Total characters: 8295889
Distinct characters: 62
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Installations for the production of cement clinker in rotary kilns
2nd row: Installations for the production of cement clinker in rotary kilns, lime in rotary kilns, cement or lime in other furnaces. Note to reporters, use Level 3 activity e.g. 3(c)(i), in preference to 3(c). Level 2 activity class (i.e. 3(c)) only to be used where Level 3 is not available.
3rd row: Landfills (excluding landfills of inert waste and landfills, which were definitely closed before 16.7.2001 or for which the after-care phase required by the competent authorities according to Article 13 of Council Directive 1999/31/EC of 26 April 1999 on the landfill of waste has expired)
4th row: Thermal power stations and other combustion installations
5th row: Urban waste-water treatment plants

Common Values

Value | Count | Frequency (%)
Thermal power stations and other combustion installations | 21527 | 32.8%
Landfills (excluding landfills of inert waste and landfills, which were definitely closed before 16.7.2001 or for which the after-care phase required by the competent authorities according to Article 13 of Council Directive 1999/31/EC of 26 April 1999 on the landfill of waste has expired) | 10452 | 15.9%
Installations for the incineration of non-hazardous waste in the scope of Directive 2000/76/EC of the European Parliament and of the Council of 4 December 2000 on the incineration of waste | 3454 | 5.3%
Installations for the production of cement clinker in rotary kilns | 3300 | 5.0%
Installations for the manufacture of glass, including glass fibre | 2725 | 4.2%
Mineral oil and gas refineries | 2454 | 3.7%
Industrial plants for the production of paper and board and other primary wood products (such as chipboard, fibreboard and plywood) | 2416 | 3.7%
Installations for the production of cement clinker in rotary kilns, lime in rotary kilns, cement or lime in other furnaces. Note to reporters, use Level 3 activity e.g. 3(c)(i), in preference to 3(c). Level 2 activity class (i.e. 3(c)) only to be used where Level 3 is not available. | 1519 | 2.3%
Installations for the production of pig iron or steel (primary or secondary melting) including continuous casting | 1461 | 2.2%
Industrial plants for the production of pulp from timber or similar fibrous materials | 1395 | 2.1%
Other values (61) | 14925 | 22.7%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
of | 88111 | 7.2%
the | 73814 | 6.0%
and | 49729 | 4.1%
installations | 44851 | 3.7%
for | 41088 | 3.3%
landfills | 31356 | 2.6%
waste | 29457 | 2.4%
other | 26424 | 2.2%
or | 23527 | 1.9%
to | 22347 | 1.8%
Other values (348) | 796131 | 64.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1161207 | 14.0%
e | 657681 | 7.9%
o | 579699 | 7.0%
i | 573175 | 6.9%
t | 542074 | 6.5%
n | 506849 | 6.1%
a | 505255 | 6.1%
r | 472130 | 5.7%
l | 420141 | 5.1%
s | 411562 | 5.0%
Other values (52) | 2466116 | 29.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6458304
77.8%
Space Separator1161207
 
14.0%
Decimal Number291000
 
3.5%
Uppercase Letter170941
 
2.1%
Other Punctuation122715
 
1.5%
Open Punctuation37186
 
0.4%
Close Punctuation37186
 
0.4%
Dash Punctuation17350
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e657681
10.2%
o579699
 
9.0%
i573175
 
8.9%
t542074
 
8.4%
n506849
 
7.8%
a505255
 
7.8%
r472130
 
7.3%
l420141
 
6.5%
s411562
 
6.4%
c288178
 
4.5%
Other values (16)1501560
23.3%
Uppercase Letter
ValueCountFrequency (%)
C32279
18.9%
I23523
13.8%
T22803
13.3%
L22347
13.1%
A20995
12.3%
D17433
10.2%
E17360
10.2%
N4404
 
2.6%
P3491
 
2.0%
M2642
 
1.5%
Other values (8)3664
 
2.1%
Decimal Number
Value | Count | Frequency (%)
1 | 62805 | 21.6%
9 | 62712 | 21.6%
0 | 47668 | 16.4%
3 | 34627 | 11.9%
2 | 33896 | 11.6%
6 | 24448 | 8.4%
7 | 15500 | 5.3%
4 | 7960 | 2.7%
8 | 1017 | 0.3%
5 | 367 | 0.1%
Other Punctuation
ValueCountFrequency (%)
.48909
39.9%
,43554
35.5%
/28017
22.8%
:2235
 
1.8%
Space Separator
Value | Count | Frequency (%)
(space) | 1161207 | 100.0%
Open Punctuation
ValueCountFrequency (%)
(37186
100.0%
Close Punctuation
ValueCountFrequency (%)
)37186
100.0%
Dash Punctuation
ValueCountFrequency (%)
-17350
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6629245
79.9%
Common1666644
 
20.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e657681
 
9.9%
o579699
 
8.7%
i573175
 
8.6%
t542074
 
8.2%
n506849
 
7.6%
a505255
 
7.6%
r472130
 
7.1%
l420141
 
6.3%
s411562
 
6.2%
c288178
 
4.3%
Other values (34)1672501
25.2%
Common
Value | Count | Frequency (%)
(space) | 1161207 | 69.7%
1 | 62805 | 3.8%
9 | 62712 | 3.8%
. | 48909 | 2.9%
0 | 47668 | 2.9%
, | 43554 | 2.6%
( | 37186 | 2.2%
) | 37186 | 2.2%
3 | 34627 | 2.1%
2 | 33896 | 2.0%
Other values (8) | 96894 | 5.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII8295889
100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1161207 | 14.0%
e | 657681 | 7.9%
o | 579699 | 7.0%
i | 573175 | 6.9%
t | 542074 | 6.5%
n | 506849 | 6.1%
a | 505255 | 6.1%
r | 472130 | 5.7%
l | 420141 | 5.1%
s | 411562 | 5.0%
Other values (52) | 2466116 | 29.7%

FacilityInspireID
Categorical

HIGH CARDINALITY

Distinct: 7185
Distinct (%): 10.9%
Missing: 0
Missing (%): 0.0%
Memory size: 512.8 KiB

Value | Count
https://data.ied_registry.omgeving.vlaanderen.be/id/productionfacility//BE.VL.000000067.FACILITY | 42
UK.CAED/BEISOffsh-Foinaven-FPSO.FACILITY | 41
ES.CAED/003486000.FACILITY | 38
UK.CAED/BEISOffsh-Alba-Northern.FACILITY | 38
UK.CAED/BEISOffsh-Nelson.FACILITY | 38
Other values (7180) | 65431

Length

Max length: 96
Median length: 85
Mean length: 33.13148351
Min length: 17

Characters and Unicode

Total characters: 2174353
Distinct characters: 67
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1033
Unique (%): 1.6%

Sample

1st row: https://registry.gdi-de.org/id/de.ni.mu/06221720040
2nd row: IT.CAED/240602021.FACILITY
3rd row: ES.CAED/001966000.FACILITY
4th row: CZ.MZP.U422/CZ34736841.FACILITY
5th row: http://paikkatiedot.fi/so/1002031/pf/ProductionFacility/0000000928.ProductionFacility

Common Values

Value | Count | Frequency (%)
https://data.ied_registry.omgeving.vlaanderen.be/id/productionfacility//BE.VL.000000067.FACILITY | 42 | 0.1%
UK.CAED/BEISOffsh-Foinaven-FPSO.FACILITY | 41 | 0.1%
ES.CAED/003486000.FACILITY | 38 | 0.1%
UK.CAED/BEISOffsh-Alba-Northern.FACILITY | 38 | 0.1%
UK.CAED/BEISOffsh-Nelson.FACILITY | 38 | 0.1%
NL.RIVM/000000062.FACILITY | 38 | 0.1%
FR.CAED/11416.FACILITY | 37 | 0.1%
FR.CAED/11428.FACILITY | 37 | 0.1%
FR.CAED/6705.FACILITY | 37 | 0.1%
SE.CAED/10019434.Facility | 36 | 0.1%
Other values (7175) | 65246 | 99.4%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
https://data.ied_registry.omgeving.vlaanderen.be/id/productionfacility//be.vl.000000067.facility | 42 | 0.1%
uk.caed/beisoffsh-foinaven-fpso.facility | 41 | 0.1%
es.caed/003486000.facility | 38 | 0.1%
uk.caed/beisoffsh-alba-northern.facility | 38 | 0.1%
uk.caed/beisoffsh-nelson.facility | 38 | 0.1%
nl.rivm/000000062.facility | 38 | 0.1%
fr.caed/11416.facility | 37 | 0.1%
fr.caed/11428.facility | 37 | 0.1%
fr.caed/6705.facility | 37 | 0.1%
se.caed/10019434.facility | 36 | 0.1%
Other values (7150) | 65246 | 99.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 207699 | 9.6%
. | 166479 | 7.7%
I | 116275 | 5.3%
/ | 115391 | 5.3%
A | 105738 | 4.9%
C | 93135 | 4.3%
E | 80103 | 3.7%
i | 71010 | 3.3%
1 | 67342 | 3.1%
F | 66990 | 3.1%
Other values (57) | 1084191 | 49.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter787373
36.2%
Lowercase Letter536686
24.7%
Decimal Number520271
23.9%
Other Punctuation292521
 
13.5%
Dash Punctuation31291
 
1.4%
Connector Punctuation6211
 
0.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I116275
14.8%
A105738
13.4%
C93135
11.8%
E80103
10.2%
F66990
8.5%
T62224
7.9%
L60118
7.6%
Y50945
6.5%
D38767
 
4.9%
P16099
 
2.0%
Other values (17)96979
12.3%
Lowercase Letter
ValueCountFrequency (%)
i71010
13.2%
t51922
 
9.7%
e50987
 
9.5%
r44789
 
8.3%
d44178
 
8.2%
g29880
 
5.6%
p28924
 
5.4%
s28721
 
5.4%
o24364
 
4.5%
a23573
 
4.4%
Other values (15)138338
25.8%
Decimal Number
Value | Count | Frequency (%)
0 | 207699 | 39.9%
1 | 67342 | 12.9%
2 | 45610 | 8.8%
3 | 33167 | 6.4%
4 | 30352 | 5.8%
7 | 29060 | 5.6%
5 | 29057 | 5.6%
6 | 28379 | 5.5%
9 | 25596 | 4.9%
8 | 24009 | 4.6%
Other Punctuation
ValueCountFrequency (%)
.166479
56.9%
/115391
39.4%
:10651
 
3.6%
Dash Punctuation
ValueCountFrequency (%)
-31291
100.0%
Connector Punctuation
ValueCountFrequency (%)
_6211
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1324059
60.9%
Common850294
39.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
I116275
 
8.8%
A105738
 
8.0%
C93135
 
7.0%
E80103
 
6.0%
i71010
 
5.4%
F66990
 
5.1%
T62224
 
4.7%
L60118
 
4.5%
t51922
 
3.9%
e50987
 
3.9%
Other values (42)565557
42.7%
Common
Value | Count | Frequency (%)
0 | 207699 | 24.4%
. | 166479 | 19.6%
/ | 115391 | 13.6%
1 | 67342 | 7.9%
2 | 45610 | 5.4%
3 | 33167 | 3.9%
- | 31291 | 3.7%
4 | 30352 | 3.6%
7 | 29060 | 3.4%
5 | 29057 | 3.4%
Other values (5) | 94846 | 11.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII2170642
99.8%
None3711
 
0.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 207699 | 9.6%
. | 166479 | 7.7%
I | 116275 | 5.4%
/ | 115391 | 5.3%
A | 105738 | 4.9%
C | 93135 | 4.3%
E | 80103 | 3.7%
i | 71010 | 3.3%
1 | 67342 | 3.1%
F | 66990 | 3.1%
Other values (56) | 1080480 | 49.8%
None
ValueCountFrequency (%)
Ś3711
100.0%

facilityName
Categorical

HIGH CARDINALITY

Distinct: 7930
Distinct (%): 12.1%
Missing: 0
Missing (%): 0.0%
Memory size: 512.8 KiB

Value | Count
Enel Produzione S.p.A. | 234
SNAM Rete Gas | 123
A2A gencogas S.p.A. | 112
Trans Austria Gasleitung GmbH | 109
Versalis S.p.A. | 102
Other values (7925) | 64948

Length

Max length: 152
Median length: 104
Mean length: 30.40080453
Min length: 3

Characters and Unicode

Total characters: 1995144
Distinct characters: 179
Distinct categories: 15
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1503
Unique (%): 2.3%

Sample

1st row: Holcim (Deutschland) GmbH Werk Höver
2nd row: Stabilimento di Tavernola Bergamasca
3rd row: COMPLEJO MEDIOAMBIENTAL DE ZURITA
4th row: Elektrárny Prunéřov
5th row: TAMPEREEN VESI LIIKELAITOS, VIINIKANLAHDEN JÄTEVEDENPUHDISTAMO

Common Values

Value | Count | Frequency (%)
Enel Produzione S.p.A. | 234 | 0.4%
SNAM Rete Gas | 123 | 0.2%
A2A gencogas S.p.A. | 112 | 0.2%
Trans Austria Gasleitung GmbH | 109 | 0.2%
Versalis S.p.A. | 102 | 0.2%
WIEN ENERGIE GmbH | 84 | 0.1%
Edison S.p.A. | 82 | 0.1%
Enipower S.p.A. | 78 | 0.1%
Eni S.p.A. | 73 | 0.1%
FERROPEM | 70 | 0.1%
Other values (7920) | 64561 | 98.4%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
(blank) | 10016 | 3.5%
de | 9554 | 3.4%
gmbh | 5370 | 1.9%
s.a | 3717 | 1.3%
di | 3009 | 1.1%
landfill | 2806 | 1.0%
power | 2502 | 0.9%
sa | 2070 | 0.7%
site | 2044 | 0.7%
ag | 1959 | 0.7%
Other values (11130) | 240153 | 84.8%

Most occurring characters

Value | Count | Frequency (%)
(space) | 222354 | 11.1%
e | 106544 | 5.3%
a | 90352 | 4.5%
A | 82570 | 4.1%
E | 82479 | 4.1%
r | 75184 | 3.8%
i | 72557 | 3.6%
n | 72463 | 3.6%
o | 65985 | 3.3%
S | 64217 | 3.2%
Other values (169) | 1060439 | 53.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter924474
46.3%
Uppercase Letter774149
38.8%
Space Separator222354
 
11.1%
Other Punctuation47366
 
2.4%
Dash Punctuation12716
 
0.6%
Decimal Number4997
 
0.3%
Open Punctuation4420
 
0.2%
Close Punctuation4382
 
0.2%
Math Symbol83
 
< 0.1%
Final Punctuation67
 
< 0.1%
Other values (5)136
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e106544
11.5%
a90352
 
9.8%
r75184
 
8.1%
i72557
 
7.8%
n72463
 
7.8%
o65985
 
7.1%
t60070
 
6.5%
l52165
 
5.6%
s40492
 
4.4%
d32315
 
3.5%
Other values (64)256347
27.7%
Uppercase Letter
ValueCountFrequency (%)
A82570
 
10.7%
E82479
 
10.7%
S64217
 
8.3%
I52578
 
6.8%
R51225
 
6.6%
O45097
 
5.8%
C42575
 
5.5%
L41748
 
5.4%
N40857
 
5.3%
T37921
 
4.9%
Other values (59)232882
30.1%
Decimal Number
Value | Count | Frequency (%)
1 | 975 | 19.5%
3 | 932 | 18.7%
2 | 906 | 18.1%
0 | 557 | 11.1%
4 | 415 | 8.3%
9 | 283 | 5.7%
5 | 278 | 5.6%
7 | 254 | 5.1%
6 | 218 | 4.4%
8 | 179 | 3.6%
Other Punctuation
ValueCountFrequency (%)
.29296
61.9%
,10595
 
22.4%
"2137
 
4.5%
/2129
 
4.5%
&1955
 
4.1%
'1155
 
2.4%
:69
 
0.1%
@21
 
< 0.1%
¿9
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
(4398
99.5%
13
 
0.3%
9
 
0.2%
Dash Punctuation
ValueCountFrequency (%)
-12644
99.4%
72
 
0.6%
Final Punctuation
ValueCountFrequency (%)
52
77.6%
15
 
22.4%
Other Letter
ValueCountFrequency (%)
º49
89.1%
ª6
 
10.9%
Modifier Symbol
ValueCountFrequency (%)
´31
67.4%
`15
32.6%
Space Separator
Value | Count | Frequency (%)
(space) | 222354 | 100.0%
Close Punctuation
ValueCountFrequency (%)
)4382
100.0%
Math Symbol
ValueCountFrequency (%)
+83
100.0%
Initial Punctuation
ValueCountFrequency (%)
28
100.0%
Connector Punctuation
ValueCountFrequency (%)
_5
100.0%
Control
ValueCountFrequency (%)
–2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1698666
85.1%
Common296466
 
14.9%
Cyrillic10
 
< 0.1%
Greek2
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e106544
 
6.3%
a90352
 
5.3%
A82570
 
4.9%
E82479
 
4.9%
r75184
 
4.4%
i72557
 
4.3%
n72463
 
4.3%
o65985
 
3.9%
S64217
 
3.8%
t60070
 
3.5%
Other values (133)926245
54.5%
Common
Value | Count | Frequency (%)
(space) | 222354 | 75.0%
. | 29296 | 9.9%
- | 12644 | 4.3%
, | 10595 | 3.6%
( | 4398 | 1.5%
) | 4382 | 1.5%
" | 2137 | 0.7%
/ | 2129 | 0.7%
& | 1955 | 0.7%
' | 1155 | 0.4%
Other values (24) | 5421 | 1.8%
Cyrillic
ValueCountFrequency (%)
І10
100.0%
Greek
ValueCountFrequency (%)
Ι2
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1963710
98.4%
None31235
 
1.6%
Punctuation189
 
< 0.1%
Cyrillic10
 
< 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 222354 | 11.3%
e | 106544 | 5.4%
a | 90352 | 4.6%
A | 82570 | 4.2%
E | 82479 | 4.2%
r | 75184 | 3.8%
i | 72557 | 3.7%
n | 72463 | 3.7%
o | 65985 | 3.4%
S | 64217 | 3.3%
Other values (67) | 1029005 | 52.4%
None
ValueCountFrequency (%)
ł3969
 
12.7%
ä3393
 
10.9%
ó2271
 
7.3%
á2211
 
7.1%
É1567
 
5.0%
ö1385
 
4.4%
ü1364
 
4.4%
Ó1141
 
3.7%
í895
 
2.9%
Á868
 
2.8%
Other values (85)12171
39.0%
Punctuation
ValueCountFrequency (%)
72
38.1%
52
27.5%
28
 
14.8%
15
 
7.9%
13
 
6.9%
9
 
4.8%
Cyrillic
ValueCountFrequency (%)
І10
100.0%

City
Categorical

HIGH CARDINALITY

Distinct: 5136
Distinct (%): 7.8%
Missing: 0
Missing (%): 0.0%
Memory size: 512.8 KiB

Value | Count
-- | 1975
Antwerpen | 341
Duisburg | 275
Cork | 220
Botlek Rotterdam | 215
Other values (5131) | 62602

Length

Max length: 47
Median length: 35
Mean length: 9.377034193
Min length: 1

Characters and Unicode

Total characters: 615396
Distinct characters: 187
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 718
Unique (%): 1.1%

Sample

1st row: Sehnde
2nd row: TAVERNOLA BERGAMASCA
3rd row: PUERTO DEL ROSARIO
4th row: Kadaň
5th row: Tampere

Common Values

Value | Count | Frequency (%)
-- | 1975 | 3.0%
Antwerpen | 341 | 0.5%
Duisburg | 275 | 0.4%
Cork | 220 | 0.3%
Botlek Rotterdam | 215 | 0.3%
FOS-SUR-MER | 159 | 0.2%
Gent | 156 | 0.2%
Hamburg | 152 | 0.2%
Berlin | 149 | 0.2%
Bremen | 149 | 0.2%
Other values (5126) | 61837 | 94.2%
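
The most common City value is the string "--", while the variable reports 0 missing values. If "--" is a placeholder for an unknown city (an assumption; the report does not say so), such rows can be counted separately. The sketch below shows the check and re-derives the 3.0% share from the counts in this table; the names are hypothetical.

```python
# Assumed placeholder tokens; the report does not define them.
PLACEHOLDERS = {"--"}


def is_placeholder(value):
    """True if the value looks like a stand-in for a missing city."""
    return value.strip() in PLACEHOLDERS


# Counts copied from the table: rows equal to "--" and total observations.
placeholder_rows = 1975
observations = 65628
placeholder_pct = 100 * placeholder_rows / observations

print(is_placeholder("--"), is_placeholder("Antwerpen"))
print(f"{placeholder_pct:.1f}%")
```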

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
(blank) | 2004 | 2.4%
de | 1615 | 2.0%
la | 832 | 1.0%
rotterdam | 518 | 0.6%
san | 467 | 0.6%
st | 361 | 0.4%
antwerpen | 342 | 0.4%
del | 289 | 0.4%
duisburg | 275 | 0.3%
am | 228 | 0.3%
Other values (5513) | 75518 | 91.6%

Most occurring characters

ValueCountFrequency (%)
e34585
 
5.6%
A32191
 
5.2%
a28510
 
4.6%
E27421
 
4.5%
n24720
 
4.0%
r24680
 
4.0%
R22567
 
3.7%
O21544
 
3.5%
S19991
 
3.2%
L19640
 
3.2%
Other values (177)359547
58.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter292753
47.6%
Lowercase Letter289560
47.1%
Space Separator17878
 
2.9%
Dash Punctuation10431
 
1.7%
Other Punctuation3271
 
0.5%
Open Punctuation579
 
0.1%
Close Punctuation579
 
0.1%
Decimal Number345
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e34585
11.9%
a28510
 
9.8%
n24720
 
8.5%
r24680
 
8.5%
o18500
 
6.4%
i18312
 
6.3%
l15698
 
5.4%
t15179
 
5.2%
s13036
 
4.5%
d10028
 
3.5%
Other values (93)86312
29.8%
Uppercase Letter
ValueCountFrequency (%)
A32191
 
11.0%
E27421
 
9.4%
R22567
 
7.7%
O21544
 
7.4%
S19991
 
6.8%
L19640
 
6.7%
N19516
 
6.7%
I18366
 
6.3%
T14019
 
4.8%
M10202
 
3.5%
Other values (56)87296
29.8%
Decimal Number
Value | Count | Frequency (%)
1 | 118 | 34.2%
5 | 70 | 20.3%
0 | 43 | 12.5%
3 | 37 | 10.7%
2 | 26 | 7.5%
8 | 19 | 5.5%
4 | 17 | 4.9%
7 | 12 | 3.5%
6 | 3 | 0.9%
Other Punctuation
ValueCountFrequency (%)
,1150
35.2%
.1089
33.3%
'664
20.3%
/367
 
11.2%
:1
 
< 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 17878 | 100.0%
Dash Punctuation
ValueCountFrequency (%)
-10431
100.0%
Open Punctuation
ValueCountFrequency (%)
(579
100.0%
Close Punctuation
ValueCountFrequency (%)
)579
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin577369
93.8%
Common33083
 
5.4%
Cyrillic4944
 
0.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e34585
 
6.0%
A32191
 
5.6%
a28510
 
4.9%
E27421
 
4.7%
n24720
 
4.3%
r24680
 
4.3%
R22567
 
3.9%
O21544
 
3.7%
S19991
 
3.5%
L19640
 
3.4%
Other values (112)321520
55.7%
Cyrillic
ValueCountFrequency (%)
о634
 
12.8%
в435
 
8.8%
и378
 
7.6%
е370
 
7.5%
л299
 
6.0%
а289
 
5.8%
р247
 
5.0%
н238
 
4.8%
С148
 
3.0%
с142
 
2.9%
Other values (37)1764
35.7%
Common
Value | Count | Frequency (%)
(space) | 17878 | 54.0%
- | 10431 | 31.5%
, | 1150 | 3.5%
. | 1089 | 3.3%
' | 664 | 2.0%
( | 579 | 1.8%
) | 579 | 1.8%
/ | 367 | 1.1%
1 | 118 | 0.4%
5 | 70 | 0.2%
Other values (8) | 158 | 0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII598060
97.2%
None12392
 
2.0%
Cyrillic4944
 
0.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e34585
 
5.8%
A32191
 
5.4%
a28510
 
4.8%
E27421
 
4.6%
n24720
 
4.1%
r24680
 
4.1%
R22567
 
3.8%
O21544
 
3.6%
S19991
 
3.3%
L19640
 
3.3%
Other values (60)342211
57.2%
None
ValueCountFrequency (%)
ü1264
 
10.2%
ó1016
 
8.2%
á835
 
6.7%
ö781
 
6.3%
ä746
 
6.0%
ł670
 
5.4%
í634
 
5.1%
Ö607
 
4.9%
Ä409
 
3.3%
ę361
 
2.9%
Other values (60)5069
40.9%
Cyrillic
ValueCountFrequency (%)
о634
 
12.8%
в435
 
8.8%
и378
 
7.6%
е370
 
7.5%
л299
 
6.0%
а289
 
5.8%
р247
 
5.0%
н238
 
4.8%
С148
 
3.0%
с142
 
2.9%
Other values (37)1764
35.7%

targetRelease
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 512.8 KiB

Value | Count
AIR | 65628

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 196884
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: AIR
2nd row: AIR
3rd row: AIR
4th row: AIR
5th row: AIR

Common Values

Value | Count | Frequency (%)
AIR | 65628 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: Category Frequency Plot]

Value | Count | Frequency (%)
air | 65628 | 100.0%

Most occurring characters

ValueCountFrequency (%)
A65628
33.3%
I65628
33.3%
R65628
33.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter196884
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A65628
33.3%
I65628
33.3%
R65628
33.3%

Most occurring scripts

ValueCountFrequency (%)
Latin196884
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A65628
33.3%
I65628
33.3%
R65628
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII196884
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A65628
33.3%
I65628
33.3%
R65628
33.3%

pollutant
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size512.8 KiB
Nitrogen oxides (NOX)
25982 
Carbon dioxide (CO2)
22964 
Methane (CH4)
16682 

Length

Max length21
Median length20
Mean length18.6165661
Min length13

Characters and Unicode

Total characters: 1221768
Distinct characters: 24
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowCarbon dioxide (CO2)
2nd rowNitrogen oxides (NOX)
3rd rowMethane (CH4)
4th rowNitrogen oxides (NOX)
5th rowMethane (CH4)

Common Values

ValueCountFrequency (%)
Nitrogen oxides (NOX)25982
39.6%
Carbon dioxide (CO2)22964
35.0%
Methane (CH4)16682
25.4%
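Each percentage in the table above is the count divided by the 65628 observations. A small sketch that reproduces them from the reported counts (the variable names are illustrative only):

```python
counts = {
    "Nitrogen oxides (NOX)": 25982,
    "Carbon dioxide (CO2)": 22964,
    "Methane (CH4)": 16682,
}
total = sum(counts.values())  # equals the number of observations

# Share of each pollutant as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```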

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
nitrogen25982
14.4%
oxides25982
14.4%
nox25982
14.4%
carbon22964
12.7%
dioxide22964
12.7%
co222964
12.7%
methane16682
9.3%
ch416682
9.3%

Most occurring characters

Value              Count   Frequency (%)
(space)            114574  9.4%
e                  108292  8.9%
o                  97892   8.0%
i                  97892   8.0%
d                  71910   5.9%
(                  65628   5.4%
)                  65628   5.4%
n                  65628   5.4%
C                  62610   5.1%
N                  51964   4.3%
Other values (14)  419750  34.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter713426
58.4%
Uppercase Letter222866
 
18.2%
Space Separator114574
 
9.4%
Open Punctuation65628
 
5.4%
Close Punctuation65628
 
5.4%
Decimal Number39646
 
3.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e108292
15.2%
o97892
13.7%
i97892
13.7%
d71910
10.1%
n65628
9.2%
x48946
6.9%
r48946
6.9%
t42664
 
6.0%
a39646
 
5.6%
s25982
 
3.6%
Other values (3)65628
9.2%
Uppercase Letter
ValueCountFrequency (%)
C62610
28.1%
N51964
23.3%
O48946
22.0%
X25982
11.7%
M16682
 
7.5%
H16682
 
7.5%
Decimal Number
Value  Count  Frequency (%)
2      22964  57.9%
4      16682  42.1%
Space Separator
Value    Count   Frequency (%)
(space)  114574  100.0%
Open Punctuation
Value  Count  Frequency (%)
(      65628  100.0%
Close Punctuation
Value  Count  Frequency (%)
)      65628  100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin936292
76.6%
Common285476
 
23.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e108292
11.6%
o97892
10.5%
i97892
10.5%
d71910
 
7.7%
n65628
 
7.0%
C62610
 
6.7%
N51964
 
5.5%
x48946
 
5.2%
O48946
 
5.2%
r48946
 
5.2%
Other values (9)233266
24.9%
Common
Value    Count   Frequency (%)
(space)  114574  40.1%
(        65628   23.0%
)        65628   23.0%
2        22964   8.0%
4        16682   5.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII1221768
100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
(space)            114574  9.4%
e                  108292  8.9%
o                  97892   8.0%
i                  97892   8.0%
d                  71910   5.9%
(                  65628   5.4%
)                  65628   5.4%
n                  65628   5.4%
C                  62610   5.1%
N                  51964   4.3%
Other values (14)  419750  34.4%

reportingYear
Real number (ℝ≥0)

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2012.935043
Minimum2007
Maximum2020
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size512.8 KiB

Quantile statistics

Minimum2007
5-th percentile2007
Q12010
median2013
Q32016
95-th percentile2019
Maximum2020
Range13
Interquartile range (IQR)6

Descriptive statistics

Standard deviation3.85365506
Coefficient of variation (CV)0.001914445811
Kurtosis-1.138349474
Mean2012.935043
Median Absolute Deviation (MAD)3
Skewness0.1234551266
Sum132104901
Variance14.85065732
MonotonicityNot monotonic
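Several of the figures above are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. The sketch below checks these against the values reported for reportingYear; it only re-derives numbers already shown and is not the profiler's code.

```python
# Values copied from the reportingYear statistics above.
mean = 2012.935043
std = 3.85365506
minimum, q1, q3, maximum = 2007, 2010, 2016, 2020

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(cv, variance, value_range, iqr)
```

Small differences in the last printed digits are expected, because the inputs are themselves rounded.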
Histogram with fixed size bins (bins=14)
Value             Count  Frequency (%)
2008              5361   8.2%
2010              5327   8.1%
2007              5266   8.0%
2009              5233   8.0%
2011              5151   7.8%
2012              5088   7.8%
2013              5073   7.7%
2014              4911   7.5%
2015              4706   7.2%
2016              4685   7.1%
Other values (4)  14827  22.6%

Value  Count  Frequency (%)
2007   5266   8.0%
2008   5361   8.2%
2009   5233   8.0%
2010   5327   8.1%
2011   5151   7.8%
2012   5088   7.8%
2013   5073   7.7%
2014   4911   7.5%
2015   4706   7.2%
2016   4685   7.1%

Value  Count  Frequency (%)
2020   2408   3.7%
2019   3771   5.7%
2018   3989   6.1%
2017   4659   7.1%
2016   4685   7.1%
2015   4706   7.2%
2014   4911   7.5%
2013   5073   7.7%
2012   5088   7.8%
2011   5151   7.8%

MONTH
Real number (ℝ≥0)

HIGH CORRELATION

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6.489973792
Minimum1
Maximum12
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size512.8 KiB

Quantile statistics

Minimum1
5-th percentile1
Q13
median7
Q39
95-th percentile12
Maximum12
Range11
Interquartile range (IQR)6

Descriptive statistics

Standard deviation3.450832526
Coefficient of variation (CV)0.5317174825
Kurtosis-1.217502713
Mean6.489973792
Median Absolute Deviation (MAD)3
Skewness-0.001143937773
Sum425924
Variance11.90824512
MonotonicityNot monotonic
Histogram with fixed size bins (bins=12)
Value             Count  Frequency (%)
8                 5575   8.5%
9                 5570   8.5%
2                 5530   8.4%
4                 5498   8.4%
1                 5498   8.4%
7                 5480   8.4%
3                 5479   8.3%
10                5453   8.3%
12                5412   8.2%
11                5387   8.2%
Other values (2)  10746  16.4%

Value  Count  Frequency (%)
1      5498   8.4%
2      5530   8.4%
3      5479   8.3%
4      5498   8.4%
5      5360   8.2%
6      5386   8.2%
7      5480   8.4%
8      5575   8.5%
9      5570   8.5%
10     5453   8.3%

Value  Count  Frequency (%)
12     5412   8.2%
11     5387   8.2%
10     5453   8.3%
9      5570   8.5%
8      5575   8.5%
7      5480   8.4%
6      5386   8.2%
5      5360   8.2%
4      5498   8.4%
3      5479   8.3%

DAY
Real number (ℝ≥0)

Distinct28
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean14.51720302
Minimum1
Maximum28
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size512.8 KiB

Quantile statistics

Minimum1
5-th percentile2
Q18
median14
Q322
95-th percentile27
Maximum28
Range27
Interquartile range (IQR)14

Descriptive statistics

Standard deviation8.097332136
Coefficient of variation (CV)0.5577749463
Kurtosis-1.208998231
Mean14.51720302
Median Absolute Deviation (MAD)7
Skewness-0.006795525997
Sum952735
Variance65.56678772
MonotonicityNot monotonic
Histogram with fixed size bins (bins=28)
Value              Count  Frequency (%)
23                 2472   3.8%
25                 2459   3.7%
13                 2448   3.7%
1                  2443   3.7%
22                 2431   3.7%
11                 2427   3.7%
9                  2396   3.7%
8                  2378   3.6%
2                  2372   3.6%
24                 2366   3.6%
Other values (18)  41436  63.1%

Value  Count  Frequency (%)
1      2443   3.7%
2      2372   3.6%
3      2338   3.6%
4      2286   3.5%
5      2281   3.5%
6      2316   3.5%
7      2259   3.4%
8      2378   3.6%
9      2396   3.7%
10     2323   3.5%

Value  Count  Frequency (%)
28     2323   3.5%
27     2330   3.6%
26     2312   3.5%
25     2459   3.7%
24     2366   3.6%
23     2472   3.8%
22     2431   3.7%
21     2303   3.5%
20     2364   3.6%
19     2245   3.4%

CONTINENT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size512.8 KiB
EUROPE
65628 

Length

Max length6
Median length6
Mean length6
Min length6

Characters and Unicode

Total characters: 393768
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowEUROPE
2nd rowEUROPE
3rd rowEUROPE
4th rowEUROPE
5th rowEUROPE

Common Values

ValueCountFrequency (%)
EUROPE65628
100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
europe65628
100.0%

Most occurring characters

ValueCountFrequency (%)
E131256
33.3%
U65628
16.7%
R65628
16.7%
O65628
16.7%
P65628
16.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter393768
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E131256
33.3%
U65628
16.7%
R65628
16.7%
O65628
16.7%
P65628
16.7%

Most occurring scripts

ValueCountFrequency (%)
Latin393768
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
E131256
33.3%
U65628
16.7%
R65628
16.7%
O65628
16.7%
P65628
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII393768
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E131256
33.3%
U65628
16.7%
R65628
16.7%
O65628
16.7%
P65628
16.7%

max_wind_speed
Real number (ℝ≥0)

HIGH CORRELATION

Distinct57060
Distinct (%)86.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15.51595781
Minimum8.011957526
Maximum22.99138212
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size512.8 KiB

Quantile statistics

Minimum8.011957526
5-th percentile10.36647333
Q113.32416598
median15.50682018
Q317.71820071
95-th percentile20.66158275
Maximum22.99138212
Range14.97942459
Interquartile range (IQR)4.394034725

Descriptive statistics

Standard deviation3.067272183
Coefficient of variation (CV)0.1976850041
Kurtosis-0.5914420931
Mean15.51595781
Median Absolute Deviation (MAD)2.198767759
Skewness-0.003789813244
Sum1018281.279
Variance9.408158646
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
17.04592471           5      < 0.1%
20.66930195           5      < 0.1%
16.61384067           5      < 0.1%
18.13470913           5      < 0.1%
15.51330409           5      < 0.1%
15.54142008           5      < 0.1%
14.76777071           5      < 0.1%
11.51424447           4      < 0.1%
16.01848214           4      < 0.1%
12.52264296           4      < 0.1%
Other values (57050)  65581  99.9%

Value        Count  Frequency (%)
8.011957526  1      < 0.1%
8.06077402   1      < 0.1%
8.062689172  1      < 0.1%
8.080201144  1      < 0.1%
8.095657704  1      < 0.1%
8.096044778  1      < 0.1%
8.105868134  2      < 0.1%
8.107999203  1      < 0.1%
8.137407304  1      < 0.1%
8.146132255  1      < 0.1%

Value        Count  Frequency (%)
22.99138212  2      < 0.1%
22.94767146  1      < 0.1%
22.94683865  1      < 0.1%
22.94551394  1      < 0.1%
22.94104232  1      < 0.1%
22.94000705  1      < 0.1%
22.93081599  1      < 0.1%
22.9095764   1      < 0.1%
22.90148581  1      < 0.1%
22.8969487   1      < 0.1%

avg_wind_speed
Real number (ℝ≥0)

HIGH CORRELATION

Distinct57060
Distinct (%)86.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean18.01528462
Minimum14.00010009
Maximum21.99997338
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size512.8 KiB

Quantile statistics

Minimum14.00010009
5-th percentile14.40200873
Q116.01219727
median18.02078864
Q320.01170176
95-th percentile21.6147175
Maximum21.99997338
Range7.999873293
Interquartile range (IQR)3.999504488

Descriptive statistics

Standard deviation2.310738906
Coefficient of variation (CV)0.1282654676
Kurtosis-1.197352358
Mean18.01528462
Median Absolute Deviation (MAD)1.998937055
Skewness-0.003151742333
Sum1182307.099
Variance5.339514294
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
18.761321             5      < 0.1%
20.99478187           5      < 0.1%
19.30661389           5      < 0.1%
18.40751236           5      < 0.1%
18.66754985           5      < 0.1%
14.64372651           5      < 0.1%
20.32197552           5      < 0.1%
14.77105586           4      < 0.1%
18.72290635           4      < 0.1%
17.446596             4      < 0.1%
Other values (57050)  65581  99.9%

Value        Count  Frequency (%)
14.00010009  1      < 0.1%
14.00028742  1      < 0.1%
14.00037559  1      < 0.1%
14.00038713  1      < 0.1%
14.00039859  2      < 0.1%
14.00040384  1      < 0.1%
14.00040427  1      < 0.1%
14.00047328  1      < 0.1%
14.00063342  1      < 0.1%
14.00072678  1      < 0.1%

Value        Count  Frequency (%)
21.99997338  1      < 0.1%
21.99991947  1      < 0.1%
21.99989079  1      < 0.1%
21.99987469  4      < 0.1%
21.99956973  1      < 0.1%
21.99945553  1      < 0.1%
21.99933135  1      < 0.1%
21.99925629  1      < 0.1%
21.99879     1      < 0.1%
21.99874293  2      < 0.1%

min_wind_speed
Real number (ℝ≥0)

HIGH CORRELATION

Distinct57060
Distinct (%)86.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean22.52103769
Minimum15.03258912
Maximum29.93360301
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size512.8 KiB

Quantile statistics

Minimum15.03258912
5-th percentile17.35821092
Q120.34615774
median22.54038734
Q324.71525128
95-th percentile27.60599227
Maximum29.93360301
Range14.90101389
Interquartile range (IQR)4.369093538

Descriptive statistics

Standard deviation3.059973017
Coefficient of variation (CV)0.1358717595
Kurtosis-0.5914370694
Mean22.52103769
Median Absolute Deviation (MAD)2.183472814
Skewness-0.02133438631
Sum1478010.662
Variance9.363434865
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
19.77531341           5      < 0.1%
28.95354841           5      < 0.1%
27.26492822           5      < 0.1%
22.46879074           5      < 0.1%
20.42878547           5      < 0.1%
17.06434585           5      < 0.1%
21.45062899           5      < 0.1%
16.59222325           4      < 0.1%
25.39492534           4      < 0.1%
20.40601414           4      < 0.1%
Other values (57050)  65581  99.9%

Value        Count  Frequency (%)
15.03258912  1      < 0.1%
15.04535762  1      < 0.1%
15.05313144  1      < 0.1%
15.05564738  1      < 0.1%
15.0595347   1      < 0.1%
15.06856854  1      < 0.1%
15.08021939  2      < 0.1%
15.10168216  1      < 0.1%
15.11690061  1      < 0.1%
15.11841179  1      < 0.1%

Value        Count  Frequency (%)
29.93360301  1      < 0.1%
29.92556692  1      < 0.1%
29.91436677  2      < 0.1%
29.90658594  1      < 0.1%
29.90434173  1      < 0.1%
29.89835866  1      < 0.1%
29.8958781   1      < 0.1%
29.88896106  1      < 0.1%
29.87586267  1      < 0.1%
29.86964849  1      < 0.1%

max_temp
Real number (ℝ)

HIGH CORRELATION

Distinct57060
Distinct (%)86.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean9.455406211
Minimum-3.141463865
Maximum20.93826591
Zeros0
Zeros (%)0.0%
Negative2770
Negative (%)4.2%
Memory size512.8 KiB

Quantile statistics

Minimum-3.141463865
5-th percentile0.297407858
Q15.879820641
median9.698967359
Q313.28241663
95-th percentile17.72662713
Maximum20.93826591
Range24.07972977
Interquartile range (IQR)7.402595989

Descriptive statistics

Standard deviation5.21652464
Coefficient of variation (CV)0.551697571
Kurtosis-0.6679210534
Mean9.455406211
Median Absolute Deviation (MAD)3.686128408
Skewness-0.1703014119
Sum620539.3988
Variance27.21212932
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
10.21924166           5      < 0.1%
4.884347717           5      < 0.1%
12.69703122           5      < 0.1%
1.595268523           5      < 0.1%
6.864678111           5      < 0.1%
5.265424175           5      < 0.1%
2.795245735           5      < 0.1%
8.608741263           4      < 0.1%
11.38831078           4      < 0.1%
12.55174191           4      < 0.1%
Other values (57050)  65581  99.9%

Value         Count  Frequency (%)
-3.141463865  2      < 0.1%
-3.075562621  1      < 0.1%
-3.071525209  1      < 0.1%
-3.041452495  2      < 0.1%
-3.033178012  1      < 0.1%
-2.957447236  1      < 0.1%
-2.947022823  1      < 0.1%
-2.93982001   1      < 0.1%
-2.939164723  1      < 0.1%
-2.92891221   1      < 0.1%

Value        Count  Frequency (%)
20.93826591  1      < 0.1%
20.92611588  1      < 0.1%
20.85588499  1      < 0.1%
20.85557011  1      < 0.1%
20.84754357  2      < 0.1%
20.84356455  3      < 0.1%
20.84062925  1      < 0.1%
20.83720876  3      < 0.1%
20.8243353   1      < 0.1%
20.81953963  1      < 0.1%

avg_temp
Real number (ℝ)

HIGH CORRELATION

Distinct57060
Distinct (%)86.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean10.44814151
Minimum-0.1991759675
Maximum19.99940286
Zeros0
Zeros (%)0.0%
Negative387
Negative (%)0.6%
Memory size512.8 KiB

Quantile statistics

Minimum-0.1991759675
5-th percentile1.310303447
Q17.186013308
median10.70150351
Q314.19357768
95-th percentile18.68819061
Maximum19.99940286
Range20.19857883
Interquartile range (IQR)7.007564371

Descriptive statistics

Standard deviation5.084528995
Coefficient of variation (CV)0.486644346
Kurtosis-0.7281556885
Mean10.44814151
Median Absolute Deviation (MAD)3.502796436
Skewness-0.1871286859
Sum685690.6313
Variance25.8524351
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
13.17967145           5      < 0.1%
7.530973596           5      < 0.1%
12.48903048           5      < 0.1%
3.525724243           5      < 0.1%
7.555133845           5      < 0.1%
6.001729848           5      < 0.1%
1.997484439           5      < 0.1%
8.369606079           4      < 0.1%
10.39451944           4      < 0.1%
12.54174356           4      < 0.1%
Other values (57050)  65581  99.9%

Value          Count  Frequency (%)
-0.1991759675  2      < 0.1%
-0.1986566594  1      < 0.1%
-0.198573916   1      < 0.1%
-0.1979472568  1      < 0.1%
-0.1956317355  1      < 0.1%
-0.1950297157  1      < 0.1%
-0.1943625907  1      < 0.1%
-0.1943099303  1      < 0.1%
-0.193819056   1      < 0.1%
-0.193703916   1      < 0.1%

Value        Count  Frequency (%)
19.99940286  2      < 0.1%
19.99934257  1      < 0.1%
19.99871025  1      < 0.1%
19.99864512  1      < 0.1%
19.99858787  1      < 0.1%
19.99852063  1      < 0.1%
19.99847884  1      < 0.1%
19.99833936  1      < 0.1%
19.99767701  1      < 0.1%
19.99762672  1      < 0.1%

min_temp
Real number (ℝ≥0)

HIGH CORRELATION

Distinct57060
Distinct (%)86.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean13.44282711
Minimum0.8948269078
Maximum24.902108
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size512.8 KiB

Quantile statistics

Minimum0.8948269078
5-th percentile4.235785504
Q19.894280935
median13.69247276
Q317.26799999
95-th percentile21.73600314
Maximum24.902108
Range24.00728109
Interquartile range (IQR)7.373719057

Descriptive statistics

Standard deviation5.216068445
Coefficient of variation (CV)0.3880187109
Kurtosis-0.6577840946
Mean13.44282711
Median Absolute Deviation (MAD)3.678393887
Skewness-0.1704416031
Sum882225.8573
Variance27.20737002
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
14.21888806           5      < 0.1%
10.81834522           5      < 0.1%
16.00170596           5      < 0.1%
6.242928589           5      < 0.1%
9.9316052             5      < 0.1%
7.851481986           5      < 0.1%
5.326421358           5      < 0.1%
11.45946965           4      < 0.1%
14.72035813           4      < 0.1%
14.74798104           4      < 0.1%
Other values (57050)  65581  99.9%

Value         Count  Frequency (%)
0.8948269078  1      < 0.1%
0.8959521782  1      < 0.1%
0.9507770814  1      < 0.1%
0.9952731205  1      < 0.1%
0.9999267043  2      < 0.1%
1.003606393   1      < 0.1%
1.009808228   1      < 0.1%
1.011517201   1      < 0.1%
1.021968087   1      < 0.1%
1.033683523   1      < 0.1%

Value        Count  Frequency (%)
24.902108    1      < 0.1%
24.88484211  1      < 0.1%
24.85542506  1      < 0.1%
24.84893262  1      < 0.1%
24.8406712   1      < 0.1%
24.82579329  1      < 0.1%
24.82537293  1      < 0.1%
24.82061335  1      < 0.1%
24.81779182  1      < 0.1%
24.81278873  2      < 0.1%

DAY WITH FOGS
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.232568416
Minimum0
Maximum19
Zeros18771
Zeros (%)28.6%
Negative0
Negative (%)0.0%
Memory size512.8 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median1
Q32
95-th percentile12
Maximum19
Range19
Interquartile range (IQR)2

Descriptive statistics

Standard deviation3.778428507
Coefficient of variation (CV)1.692413312
Kurtosis7.946473712
Mean2.232568416
Median Absolute Deviation (MAD)1
Skewness2.909564205
Sum146519
Variance14.27652198
MonotonicityNot monotonic
Histogram with fixed size bins (bins=20)
Value              Count  Frequency (%)
1                  19524  29.7%
2                  19286  29.4%
0                  18771  28.6%
11                 522    0.8%
5                  512    0.8%
18                 509    0.8%
15                 501    0.8%
4                  493    0.8%
3                  483    0.7%
12                 481    0.7%
Other values (10)  4546   6.9%

Value  Count  Frequency (%)
0      18771  28.6%
1      19524  29.7%
2      19286  29.4%
3      483    0.7%
4      493    0.8%
5      512    0.8%
6      445    0.7%
7      434    0.7%
8      467    0.7%
9      478    0.7%

Value  Count  Frequency (%)
19     478    0.7%
18     509    0.8%
17     446    0.7%
16     469    0.7%
15     501    0.8%
14     470    0.7%
13     389    0.6%
12     481    0.7%
11     522    0.8%
10     470    0.7%

REPORTER NAME
Categorical

HIGH CARDINALITY
UNIFORM

Distinct45016
Distinct (%)68.6%
Missing0
Missing (%)0.0%
Memory size512.8 KiB
Michael Brown
 
25
James Smith
 
25
Michael Smith
 
23
Michael Williams
 
22
Robert Jones
 
22
Other values (45011)
65511 

Length

Max length28
Median length26
Mean length13.27245993
Min length6

Characters and Unicode

Total characters: 871045
Distinct characters: 54
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32940
Unique (%): 50.2%

Sample

1st rowMr. Jacob Ortega
2nd rowAshlee Serrano
3rd rowVincent Kemp
4th rowCarol Gray
5th rowBlake Ford
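
The Distinct and Unique figures reported for this variable measure different things: distinct counts the different values that occur, while unique counts the values that occur exactly once, which is why 45016 distinct names include only 32940 unique ones. A minimal sketch on a made-up list of names (the sample data and the function name are hypothetical):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# "Michael Brown" repeats, so it is distinct but not unique.
names = ["Michael Brown", "James Smith", "Michael Brown", "Carol Gray"]
print(distinct_and_unique(names))  # (3, 2)
```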

Common Values

Value                 Count  Frequency (%)
Michael Brown         25     < 0.1%
James Smith           25     < 0.1%
Michael Smith         23     < 0.1%
Michael Williams      22     < 0.1%
Robert Jones          22     < 0.1%
Christopher Smith     20     < 0.1%
Christopher Johnson   20     < 0.1%
Jessica Smith         19     < 0.1%
Jennifer Johnson      19     < 0.1%
Jennifer Smith        19     < 0.1%
Other values (45006)  65414  99.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
michael1467
 
1.1%
smith1349
 
1.0%
johnson1158
 
0.9%
james1093
 
0.8%
david973
 
0.7%
john958
 
0.7%
christopher953
 
0.7%
williams916
 
0.7%
robert907
 
0.7%
jennifer905
 
0.7%
Other values (1588)123554
92.0%

Most occurring characters

Value              Count   Frequency (%)
e                  80647   9.3%
a                  80455   9.2%
(space)            68605   7.9%
n                  64959   7.5%
r                  63053   7.2%
i                  52172   6.0%
o                  47478   5.5%
l                  44225   5.1%
s                  39163   4.5%
t                  30171   3.5%
Other values (44)  300117  34.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter664657
76.3%
Uppercase Letter136382
 
15.7%
Space Separator68605
 
7.9%
Other Punctuation1401
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e80647
12.1%
a80455
12.1%
n64959
9.8%
r63053
9.5%
i52172
 
7.8%
o47478
 
7.1%
l44225
 
6.7%
s39163
 
5.9%
t30171
 
4.5%
h29258
 
4.4%
Other values (16)133076
20.0%
Uppercase Letter
ValueCountFrequency (%)
M15107
 
11.1%
J13561
 
9.9%
S11097
 
8.1%
C10168
 
7.5%
D9133
 
6.7%
R8472
 
6.2%
B8438
 
6.2%
A8390
 
6.2%
W6439
 
4.7%
H6108
 
4.5%
Other values (16)39469
28.9%
Space Separator
Value    Count  Frequency (%)
(space)  68605  100.0%
Other Punctuation
Value  Count  Frequency (%)
.      1401   100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin801039
92.0%
Common70006
 
8.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e80647
 
10.1%
a80455
 
10.0%
n64959
 
8.1%
r63053
 
7.9%
i52172
 
6.5%
o47478
 
5.9%
l44225
 
5.5%
s39163
 
4.9%
t30171
 
3.8%
h29258
 
3.7%
Other values (42)269458
33.6%
Common
Value    Count  Frequency (%)
(space)  68605  98.0%
.        1401   2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII871045
100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
e                  80647   9.3%
a                  80455   9.2%
(space)            68605   7.9%
n                  64959   7.5%
r                  63053   7.2%
i                  52172   6.0%
o                  47478   5.5%
l                  44225   5.1%
s                  39163   4.5%
t                  30171   3.5%
Other values (44)  300117  34.5%

CITY ID
Categorical

HIGH CARDINALITY

Distinct5136
Distinct (%)7.8%
Missing0
Missing (%)0.0%
Memory size512.8 KiB
cfab1ba8c67c7c838db98d666f02a132
 
1975
aed13ea855ff8b71cd5ceb869fe744c1
 
341
f53da95e5700ca1e7d12b7a833d62663
 
275
002c887b8369e59e6f58a5d06a8d0817
 
220
0759b751086c80f98aa59e11e6a115b4
 
215
Other values (5131)
62602 

Length

Max length32
Median length32
Mean length32
Min length32

Characters and Unicode

Total characters: 2100096
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 718
Unique (%): 1.1%

Sample

1st row7cdb5e74adcb2ffaa21c1b61395a984f
2nd rowcd1dbabbdba230b828c657a9b19a8963
3rd row5011e3fa1436d15b34f1287f312fbada
4th row37a6d7a71c4f7c2469e4f01b70dd90c2
5th row471fe554e1c62d1b01cc8e4e5076c61a

Common Values

Value                             Count  Frequency (%)
cfab1ba8c67c7c838db98d666f02a132  1975   3.0%
aed13ea855ff8b71cd5ceb869fe744c1  341    0.5%
f53da95e5700ca1e7d12b7a833d62663  275    0.4%
002c887b8369e59e6f58a5d06a8d0817  220    0.3%
0759b751086c80f98aa59e11e6a115b4  215    0.3%
8dc3d6c792dfa6e7eb4c59921e6c635a  159    0.2%
bc1f8a8dc753022dcebc810482590fdd  156    0.2%
35d7df6ed3d93be2927d14acc5f1fc9a  152    0.2%
ee1611b61f5688e70c12b40684dbb395  149    0.2%
92c1f80a07ad537ddb7e00137d6a25f9  149    0.2%
Other values (5126)               61837  94.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
cfab1ba8c67c7c838db98d666f02a1321975
 
3.0%
aed13ea855ff8b71cd5ceb869fe744c1341
 
0.5%
f53da95e5700ca1e7d12b7a833d62663275
 
0.4%
002c887b8369e59e6f58a5d06a8d0817220
 
0.3%
0759b751086c80f98aa59e11e6a115b4215
 
0.3%
8dc3d6c792dfa6e7eb4c59921e6c635a159
 
0.2%
bc1f8a8dc753022dcebc810482590fdd156
 
0.2%
35d7df6ed3d93be2927d14acc5f1fc9a152
 
0.2%
92c1f80a07ad537ddb7e00137d6a25f9149
 
0.2%
ee1611b61f5688e70c12b40684dbb395149
 
0.2%
Other values (5126)61837
94.2%

Most occurring characters

Value             Count   Frequency (%)
6                 134944  6.4%
d                 134213  6.4%
b                 133398  6.4%
2                 133360  6.4%
c                 133359  6.4%
0                 132658  6.3%
8                 132592  6.3%
f                 132408  6.3%
a                 132324  6.3%
7                 131442  6.3%
Other values (6)  769398  36.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1306393
62.2%
Lowercase Letter793703
37.8%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
6      134944  10.3%
2      133360  10.2%
0      132658  10.2%
8      132592  10.1%
7      131442  10.1%
1      130328  10.0%
5      129247  9.9%
3      129203  9.9%
4      126892  9.7%
9      125727  9.6%
Lowercase Letter
ValueCountFrequency (%)
d134213
16.9%
b133398
16.8%
c133359
16.8%
f132408
16.7%
a132324
16.7%
e128001
16.1%

Most occurring scripts

ValueCountFrequency (%)
Common1306393
62.2%
Latin793703
37.8%

Most frequent character per script

Common
Value  Count   Frequency (%)
6      134944  10.3%
2      133360  10.2%
0      132658  10.2%
8      132592  10.1%
7      131442  10.1%
1      130328  10.0%
5      129247  9.9%
3      129203  9.9%
4      126892  9.7%
9      125727  9.6%
Latin
ValueCountFrequency (%)
d134213
16.9%
b133398
16.8%
c133359
16.8%
f132408
16.7%
a132324
16.7%
e128001
16.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2100096
100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
6                 134944  6.4%
d                 134213  6.4%
b                 133398  6.4%
2                 133360  6.4%
c                 133359  6.4%
0                 132658  6.3%
8                 132592  6.3%
f                 132408  6.3%
a                 132324  6.3%
7                 131442  6.3%
Other values (6)  769398  36.6%

EPRTRAnnexIMainActivityCode
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct70
Distinct (%)0.1%
Missing2
Missing (%)< 0.1%
Memory size512.8 KiB
1(c)
21527 
5(d)
10452 
5(b)
3454 
3(c)(i)
3300 
3(e)
2725 
Other values (65)
24168 

Length

Max length10
Median length4
Mean length4.551900162
Min length4

Characters and Unicode

Total characters: 298723
Distinct characters: 21
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row3(c)(i)
2nd row3(c)
3rd row5(d)
4th row1(c)
5th row5(f)

Common Values

Value              Count  Frequency (%)
1(c)               21527  32.8%
5(d)               10452  15.9%
5(b)               3454   5.3%
3(c)(i)            3300   5.0%
3(e)               2725   4.2%
1(a)               2454   3.7%
6(b)               2416   3.7%
3(c)               1519   2.3%
2(b)               1461   2.2%
6(a)               1395   2.1%
Other values (60)  14923  22.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1(c21527
32.8%
5(d10452
15.9%
5(b3454
 
5.3%
3(c)(i3300
 
5.0%
3(e2725
 
4.2%
1(a2454
 
3.7%
6(b2416
 
3.7%
3(c1519
 
2.3%
2(b1461
 
2.2%
6(a1395
 
2.1%
Other values (60)14923
22.7%

Most occurring characters

Value              Count  Frequency (%)
)                  75521  25.3%
(                  75521  25.3%
c                  29180  9.8%
1                  24562  8.2%
5                  15889  5.3%
i                  15339  5.1%
d                  10948  3.7%
a                  10545  3.5%
3                  10188  3.4%
b                  10034  3.4%
Other values (11)  20996  7.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter82055
27.5%
Close Punctuation75521
25.3%
Open Punctuation75521
25.3%
Decimal Number65626
22.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c29180
35.6%
i15339
18.7%
d10948
 
13.3%
a10545
 
12.9%
b10034
 
12.2%
e3884
 
4.7%
v964
 
1.2%
f616
 
0.8%
g419
 
0.5%
x126
 
0.2%
Decimal Number
Value  Count  Frequency (%)
1      24562  37.4%
5      15889  24.2%
3      10188  15.5%
4      4332   6.6%
6      3817   5.8%
2      3154   4.8%
7      2144   3.3%
8      1305   2.0%
9      235    0.4%
Close Punctuation
Value  Count  Frequency (%)
)      75521  100.0%
Open Punctuation
Value  Count  Frequency (%)
(      75521  100.0%

Most occurring scripts

ValueCountFrequency (%)
Common216668
72.5%
Latin82055
 
27.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
)      75521  34.9%
(      75521  34.9%
1      24562  11.3%
5      15889  7.3%
3      10188  4.7%
4      4332   2.0%
6      3817   1.8%
2      3154   1.5%
7      2144   1.0%
8      1305   0.6%
Latin
ValueCountFrequency (%)
c29180
35.6%
i15339
18.7%
d10948
 
13.3%
a10545
 
12.9%
b10034
 
12.2%
e3884
 
4.7%
v964
 
1.2%
f616
 
0.8%
g419
 
0.5%
x126
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII298723
100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
)                  75521  25.3%
(                  75521  25.3%
c                  29180  9.8%
1                  24562  8.2%
5                  15889  5.3%
i                  15339  5.1%
d                  10948  3.7%
a                  10545  3.5%
3                  10188  3.4%
b                  10034  3.4%
Other values (11)  20996  7.0%

EPRTRSectorCode
Real number (ℝ≥0)

HIGH CORRELATION

Distinct9
Distinct (%)< 0.1%
Missing2
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean3.179715357
Minimum1
Maximum9
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size512.8 KiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median3
Q35
95-th percentile7
Maximum9
Range8
Interquartile range (IQR)4

Descriptive statistics

Standard deviation2.054170901
Coefficient of variation (CV)0.6460235179
Kurtosis-0.9379714527
Mean3.179715357
Median Absolute Deviation (MAD)2
Skewness0.4153893295
Sum208672
Variance4.219618089
MonotonicityNot monotonic
Histogram with fixed size bins (bins=9)
Value      Count  Frequency (%)
1          24562  37.4%
5          15889  24.2%
3          10188  15.5%
4          4332   6.6%
6          3817   5.8%
2          3154   4.8%
7          2144   3.3%
8          1305   2.0%
9          235    0.4%
(Missing)  2      < 0.1%

Value  Count  Frequency (%)
1      24562  37.4%
2      3154   4.8%
3      10188  15.5%
4      4332   6.6%
5      15889  24.2%
6      3817   5.8%
7      2144   3.3%
8      1305   2.0%
9      235    0.4%

Value  Count  Frequency (%)
9      235    0.4%
8      1305   2.0%
7      2144   3.3%
6      3817   5.8%
5      15889  24.2%
4      4332   6.6%
3      10188  15.5%
2      3154   4.8%
1      24562  37.4%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
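
The calculation described above can be sketched in plain Python: rank both variables, then apply the usual covariance-over-standard-deviations formula to the ranks. The helper names (`average_ranks`, `pearson`, `spearman`) and the toy data are illustrative assumptions, and giving tied values the mean of their ranks is one common convention rather than something the report specifies.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

rho = spearman(x, y)  # ranks agree exactly
r = pearson(x, y)     # raw values are not on a straight line
print(rho, r)
```

On this toy data ρ is 1 (up to floating-point rounding) while r is below 1, which illustrates the point about nonlinear monotonic relationships.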

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (up to a sign flip when a scale factor is negative), so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
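
A minimal sketch of this formula in plain Python, together with a check of the location-and-scale invariance mentioned above. The function name `pearson`, the toy data and the particular shift and scale constants are illustrative assumptions.

```python
def pearson(x, y):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = (sum((a - mean_x) ** 2 for a in x) / (n - 1)) ** 0.5
    std_y = (sum((b - mean_y) ** 2 for b in y) / (n - 1)) ** 0.5
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson(x, y)

# Shift and rescale each variable separately with positive scale factors:
# r is unchanged.
r_transformed = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

# A negative scale factor on one variable flips the sign of r.
r_flipped = pearson(x, [-b for b in y])

print(r, r_transformed, r_flipped)
```

The n - 1 factors cancel in the ratio, so using population rather than sample formulas would give the same r.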

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
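
The pair-counting definition above can be sketched directly in plain Python. This is the simple form often called tau-a; library implementations commonly use a tie-adjusted variant, so results may differ when the data contain ties. The function name and toy data are illustrative assumptions.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # product == 0 means a tie in x or y; it counts as neither.
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

For this toy data there are 10 pairs, 7 concordant and 3 discordant, so τ = (7 - 3) / 10 = 0.4.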

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
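
A minimal sketch of a bias-corrected Cramér's V in plain Python, following the correction as it is commonly stated: the squared phi coefficient is reduced by (k - 1)(r - 1)/(n - 1) and the row and column counts are shrunk accordingly. The function name and the toy contingency tables are illustrative assumptions, and this is not necessarily the exact code path used to produce the report. The sketch assumes every row and column total is positive.

```python
def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return (phi2_corr / min(k_corr - 1, r_corr - 1)) ** 0.5

independent = [[20, 30], [40, 60]]  # rows are proportional: no association
associated = [[50, 0], [0, 50]]     # each row determines the column

print(cramers_v_corrected(independent), cramers_v_corrected(associated))
```

The two toy tables sit at the ends of the range described above: the proportional table gives 0 and the diagonal table gives 1 (up to floating-point rounding).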

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
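
One way to read nullity correlation is as the Pearson correlation between the missingness indicators of two columns (1 where a value is missing, 0 where it is present). The sketch below takes that reading as an assumption. The function name and the toy columns are hypothetical and are not taken from the dataset.

```python
def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # A column that is fully present (or fully missing) has no variation
        # in its indicator, so the correlation is undefined.
        return None
    return cov / (var_a * var_b) ** 0.5

# Hypothetical toy columns; None marks a missing cell.
col_x = ["a", None, "c", None, "e"]
col_y = ["p", None, "r", None, "t"]  # missing exactly where col_x is missing
col_z = [1, 2, None, 4, 5]           # missing only where col_x is present
col_full = [1, 2, 3, 4, 5]           # never missing

print(nullity_correlation(col_x, col_y))     # missing together
print(nullity_correlation(col_x, col_z))     # negative: never missing together
print(nullity_correlation(col_x, col_full))  # undefined
```

A value near 1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present when the other is missing.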
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index) | df_index | countryName | eprtrSectorName | EPRTRAnnexIMainActivityLabel | FacilityInspireID | facilityName | City | targetRelease | pollutant | reportingYear | MONTH | DAY | CONTINENT | max_wind_speed | avg_wind_speed | min_wind_speed | max_temp | avg_temp | min_temp | DAY WITH FOGS | REPORTER NAME | CITY ID | EPRTRAnnexIMainActivityCode | EPRTRSectorCode
0 | 0 | Germany | Mineral industry | Installations for the production of cement clinker in rotary kilns | https://registry.gdi-de.org/id/de.ni.mu/06221720040 | Holcim (Deutschland) GmbH Werk Höver | Sehnde | AIR | Carbon dioxide (CO2) | 2015 | 10 | 20 | EUROPE | 15.118767 | 14.312541 | 21.419106 | 2.864895 | 4.924169 | 9.688206 | 2 | Mr. Jacob Ortega | 7cdb5e74adcb2ffaa21c1b61395a984f | 3(c)(i) | 3
1 | 1 | Italy | Mineral industry | Installations for the production of cement clinker in rotary kilns, lime in rotary kilns, cement or lime in other furnaces. Note to reporters, use Level 3 activity e.g. 3(c)(i), in preference to 3(c). Level 2 activity class (i.e. 3(c)) only to be used where Level 3 is not available. | IT.CAED/240602021.FACILITY | Stabilimento di Tavernola Bergamasca | TAVERNOLA BERGAMASCA | AIR | Nitrogen oxides (NOX) | 2018 | 9 | 21 | EUROPE | 19.661550 | 19.368166 | 21.756389 | 5.462839 | 7.864403 | 12.023521 | 1 | Ashlee Serrano | cd1dbabbdba230b828c657a9b19a8963 | 3(c) | 3
2 | 2 | Spain | Waste and wastewater management | Landfills (excluding landfills of inert waste and landfills, which were definitely closed before 16.7.2001 or for which the after-care phase required by the competent authorities according to Article 13 of Council Directive 1999/31/EC of 26 April 1999 on the landfill of waste has expired) | ES.CAED/001966000.FACILITY | COMPLEJO MEDIOAMBIENTAL DE ZURITA | PUERTO DEL ROSARIO | AIR | Methane (CH4) | 2019 | 2 | 4 | EUROPE | 12.729453 | 14.701985 | 17.103930 | 1.511201 | 4.233438 | 8.632193 | 2 | Vincent Kemp | 5011e3fa1436d15b34f1287f312fbada | 5(d) | 5
3 | 3 | Czechia | Energy sector | Thermal power stations and other combustion installations | CZ.MZP.U422/CZ34736841.FACILITY | Elektrárny Prunéřov | Kadaň | AIR | Nitrogen oxides (NOX) | 2012 | 8 | 6 | EUROPE | 11.856417 | 16.122584 | 17.537184 | 10.970301 | 10.298348 | 15.179215 | 0 | Carol Gray | 37a6d7a71c4f7c2469e4f01b70dd90c2 | 1(c) | 1
4 | 4 | Finland | Waste and wastewater management | Urban waste-water treatment plants | http://paikkatiedot.fi/so/1002031/pf/ProductionFacility/0000000928.ProductionFacility | TAMPEREEN VESI LIIKELAITOS, VIINIKANLAHDEN JÄTEVEDENPUHDISTAMO | Tampere | AIR | Methane (CH4) | 2018 | 12 | 22 | EUROPE | 17.111930 | 20.201604 | 21.536012 | 11.772039 | 11.344078 | 16.039004 | 2 | Blake Ford | 471fe554e1c62d1b01cc8e4e5076c61a | 5(f) | 5
5 | 5 | Switzerland | Energy sector | Mineral oil and gas refineries | CH.CAED/000000011.Facility | Varo Refining Cressier SA / Raffinerie de Cressier | Cressier | AIR | Nitrogen oxides (NOX) | 2009 | 11 | 26 | EUROPE | 13.610384 | 16.054021 | 18.476185 | 0.218463 | 1.695830 | 3.081757 | 2 | Jonathan Evans | 9ecac1661f9a6d2ea27ea6582db34d9f | 1(a) | 1
6 | 6 | France | Mineral industry | Installations for the manufacture of glass, including glass fibre | FR.CAED/11626.FACILITY | VERALLIA | COGNAC | AIR | Carbon dioxide (CO2) | 2008 | 5 | 5 | EUROPE | 12.816569 | 15.940397 | 21.873807 | 10.954453 | 13.806014 | 16.682482 | 1 | Kara Martin | 1eb1fba9d2767e70c428514f7299acc0 | 3(e) | 3
7 | 7 | Poland | Paper and wood production and processing | Industrial plants for the production of paper and board and other primary wood products (such as chipboard, fibreboard and plywood) | PL.MŚ/000000138.FACILITY | Arctic Paper Kostrzyn S.A. | Kostrzyn nad Odrą | AIR | Carbon dioxide (CO2) | 2011 | 4 | 11 | EUROPE | 9.143964 | 14.174349 | 19.879915 | 11.915887 | 12.930775 | 17.699905 | 2 | David Nichols | 90ada31eb6075ca41d9e7b23d27b1526 | 6(b) | 6
8 | 8 | United Kingdom | Energy sector | Thermal power stations and other combustion installations | UK.CAED/BEISOffsh-Bleo-Holm.FACILITY | Bleo Holm FPSO | -- | AIR | Carbon dioxide (CO2) | 2010 | 6 | 20 | EUROPE | 20.766119 | 21.205965 | 26.255209 | 6.348797 | 7.877442 | 11.130713 | 17 | Frederick Chapman | cfab1ba8c67c7c838db98d666f02a132 | 1(c) | 1
9 | 9 | France | Chemical industry | Chemical installations for the production on an industrial scale of basic organic chemicals: Simple hydrocarbons (linear or cyclic, saturated or unsaturated, aliphatic or aromatic) | FR.CAED/3839.FACILITY | USINE DE GONFREVILLE | GONFREVILLE-L'ORCHER | AIR | Carbon dioxide (CO2) | 2014 | 11 | 13 | EUROPE | 17.949222 | 20.947898 | 27.992034 | 9.633089 | 10.736422 | 12.305046 | 0 | Sheena Conner | bf61dcbfc9487dc9dd63e8100d0b057e | 4(a)(i) | 4

Last rows

(index) | df_index | countryName | eprtrSectorName | EPRTRAnnexIMainActivityLabel | FacilityInspireID | facilityName | City | targetRelease | pollutant | reportingYear | MONTH | DAY | CONTINENT | max_wind_speed | avg_wind_speed | min_wind_speed | max_temp | avg_temp | min_temp | DAY WITH FOGS | REPORTER NAME | CITY ID | EPRTRAnnexIMainActivityCode | EPRTRSectorCode
65618 | 18066 | France | Energy sector | Thermal power stations and other combustion installations | FR.CAED/12044.FACILITY | EDF PRODUCTION ELECTRIQUE INSULAIRE - ETABLISSEMENT DE HAUTE CORSE | LUCCIANA | AIR | Nitrogen oxides (NOX) | 2016 | 5 | 24 | EUROPE | 12.098650 | 17.018423 | 23.973918 | 17.980697 | 19.247893 | 22.781276 | 0 | Kimberly Taylor | 495a606d3f1402613349b0c95d35d931 | 1(c) | 1
65619 | 42629 | Italy | Waste and wastewater management | Landfills (excluding landfills of inert waste and landfills, which were definitely closed before 16.7.2001 or for which the after-care phase required by the competent authorities according to Article 13 of Council Directive 1999/31/EC of 26 April 1999 on the landfill of waste has expired) | IT.EEA/104.FACILITY | Discarica di Barengo (NO) | BARENGO | AIR | Methane (CH4) | 2017 | 2 | 26 | EUROPE | 18.431998 | 20.528478 | 26.140398 | 8.145384 | 8.357491 | 10.096647 | 0 | Colin Hammond | 7e0cee13d05d1d0ea4ca0973fcc1bf7d | 5(d) | 5
65620 | 11953 | France | Mineral industry | Installations for the manufacture of glass, including glass fibre | FR.CAED/10710.FACILITY | ARC FRANCE - SITE D'ARQUES | ARQUES | AIR | Carbon dioxide (CO2) | 2007 | 12 | 17 | EUROPE | 14.328479 | 19.974901 | 25.638929 | 2.744428 | 2.634764 | 5.252293 | 1 | Madison Jackson | 45f325609b3242ae51996742cacb606e | 3(e) | 3
65621 | 42042 | Italy | Waste and wastewater management | Landfills (excluding landfills of inert waste and landfills, which were definitely closed before 16.7.2001 or for which the after-care phase required by the competent authorities according to Article 13 of Council Directive 1999/31/EC of 26 April 1999 on the landfill of waste has expired) | IT.EEA/115315.FACILITY | MANDURIAMBIENTE S.p.A. | MANDURIA | AIR | Methane (CH4) | 2016 | 10 | 23 | EUROPE | 16.412310 | 17.421044 | 19.317722 | 11.321086 | 13.729427 | 16.232119 | 0 | Kimberly Scott | 3d508ddbc66ac3b45f01e5c7b191619e | 5(d) | 5
65622 | 56922 | Serbia | Chemical industry | Chemical installations for the production on an industrial scale of basic organic chemicals: Oxygen-containing hydrocarbons such as alcohols, aldehydes, ketones, carboxylic acids, esters, acetates, ethers, peroxides, epoxy resins | RS.SEPA.NRIZ/FACILITY.000000116 | MSK postrojenje | Kikinda | AIR | Nitrogen oxides (NOX) | 2019 | 8 | 11 | EUROPE | 15.719701 | 16.408202 | 22.311666 | 10.650435 | 11.022683 | 15.825824 | 2 | Francisco Wilson | ffdce8563b060038d08b880c452d042e | 4(a)(ii) | 4
65623 | 5147 | Cyprus | Energy sector | Thermal power stations and other combustion installations | CY.CAED/0030030000.FACILITY | Electricity Authority of Cyprus, Vassilikos Power Station | LARNAKA | AIR | Carbon dioxide (CO2) | 2008 | 1 | 1 | EUROPE | 13.475988 | 18.556476 | 22.852530 | 13.345801 | 12.410783 | 17.148327 | 0 | Tammy Faulkner | 2d4776365b33d5f1be53ea4606e2c79c | 1(c) | 1
65624 | 9442 | Finland | Energy sector | Thermal power stations and other combustion installations | http://paikkatiedot.fi/so/1002031/pf/ProductionFacility/0000001728.ProductionFacility | Turun Seudun Energiantuotanto Oy, Naantalin voimalaitos | Naantali | AIR | Nitrogen oxides (NOX) | 2008 | 12 | 19 | EUROPE | 8.815939 | 14.461703 | 20.553781 | 3.820281 | 3.763833 | 5.657107 | 0 | Dr. Courtney Bryant | 020b11bf06b96aae1dd910a56674a8aa | 1(c) | 1
65625 | 57189 | Slovenia | Waste and wastewater management | Landfills (excluding landfills of inert waste and landfills, which were definitely closed before 16.7.2001 or for which the after-care phase required by the competent authorities according to Article 13 of Council Directive 1999/31/EC of 26 April 1999 on the landfill of waste has expired) | SI.ARSO/000000037.FACILITY | Javne službe Ptuj, Odlagališče nenevarnih odpadkov Gajke | Ptuj | AIR | Methane (CH4) | 2010 | 8 | 10 | EUROPE | 14.793298 | 16.688049 | 20.411498 | 17.285365 | 18.349798 | 21.538441 | 2 | William Greer | 84afdc8367dfd9124e8b8f994e986fe9 | 5(d) | 5
65626 | 40953 | Italy | Mineral industry | Underground mining and related operations | IT.CAED/850592002.FACILITY | Centro Olio Val d'Agri | VIGGIANO | AIR | Nitrogen oxides (NOX) | 2014 | 1 or 12 | 25 or 5 | EUROPE | 14.911317 | 16.144091 | 22.647192 | 6.387199 | 6.176238 | 9.269076 | 0 | Leonard Roberts | 09ad69bcf41256f40be3314a33e0438c | 3(a) | 3
65627 | 71260 | United Kingdom | Energy sector | Thermal power stations and other combustion installations | GB.EEA/13394.FACILITY | SSE Generation Ltd, Weston Point Salt Works CHP Pant | Runcorn | AIR | Carbon dioxide (CO2) | 2008 | 7 | 23 | EUROPE | 21.761812 | 21.296949 | 29.248276 | 8.220678 | 11.194308 | 14.171780 | 13 | Mr. Benjamin Park | b5f44c55c14c881ea21499a32fc972d0 | 1(c) | 1

Duplicate rows

Most frequently occurring

(index) | countryName | eprtrSectorName | EPRTRAnnexIMainActivityLabel | FacilityInspireID | facilityName | City | targetRelease | pollutant | reportingYear | MONTH | DAY | CONTINENT | max_wind_speed | avg_wind_speed | min_wind_speed | max_temp | avg_temp | min_temp | DAY WITH FOGS | REPORTER NAME | CITY ID | EPRTRAnnexIMainActivityCode | EPRTRSectorCode | # duplicates
212 | Belgium | Chemical industry | Chemical installations for the production on an industrial scale of basic organic chemicals: Simple hydrocarbons (linear or cyclic, saturated or unsaturated, aliphatic or aromatic) | https://data.ied_registry.omgeving.vlaanderen.be/id/productionfacility//BE.VL.000000067.FACILITY | BASF ANTWERPEN | Antwerpen | AIR | Nitrogen oxides (NOX) | 2017 | 11 | 28 | EUROPE | 14.767771 | 20.321976 | 21.450629 | 2.795246 | 1.997484 | 5.326421 | 2 | Jonathan Dawson | aed13ea855ff8b71cd5ceb869fe744c1 | 4(a)(i) | 4 | 5
2882 | Germany | Waste and wastewater management | Landfills (excluding landfills of inert waste and landfills, which were definitely closed before 16.7.2001 or for which the after-care phase required by the competent authorities according to Article 13 of Council Directive 1999/31/EC of 26 April 1999 on the landfill of waste has expired) | DE.EEA/43255.FACILITY | Kreismülldeponie Hattorf am Harz | Hattorf | AIR | Methane (CH4) | 2013 | 12 | 23 | EUROPE | 20.669302 | 20.994782 | 28.953548 | 4.884348 | 7.530974 | 10.818345 | 0 | Ruth Nichols | 5ec132e7607c7236c6637865b567b812 | 5(d) | 5 | 5
3694 | Italy | Intensive livestock production and aquaculture | Installations for the intensive rearing of poultry or pigs. Note to reporters, use Level 3 activity e.g. 7(a)(ii), in preference to 7(a). Level 2 activity class (i.e. 7(a)) only to be used where Level 3 is not available. | IT.CAED/560402001.FACILITY | Torre a Cenaia Soc. Agr. srl | CRESPINA LORENZANA | AIR | Methane (CH4) | 2017 | 9 | 10 | EUROPE | 18.134709 | 18.407512 | 22.468791 | 1.595269 | 3.525724 | 6.242929 | 2 | Jeffrey Sanchez | a997edd7e5658a6ca2e8c59b960e8859 | 7(a) | 7 | 5
5080 | Romania | Energy sector | Thermal power stations and other combustion installations | RO.CAED/101VL0001.FACILITY | SC CET GOVORA SA | RAMNICU VALCEA | AIR | Nitrogen oxides (NOX) | 2009 | 6 | 13 | EUROPE | 16.613841 | 19.306614 | 27.264928 | 12.697031 | 12.489030 | 16.001706 | 2 | Matthew Rubio DVM | 7443dc1372eebc031dfea6ef5b9a9344 | 1(c) | 1 | 5
5180 | Romania | Mineral industry | Underground mining and related operations | RO.CAED/105HD0001.FACILITY | SCEH S.A , Sucursala Divizia Miniera S.A, Punct de lucru E.M.Vulcan | Vulcan | AIR | Methane (CH4) | 2017 | 2 | 9 | EUROPE | 15.541420 | 14.643727 | 17.064346 | 5.265424 | 6.001730 | 7.851482 | 2 | Natasha Jones | ba0bac8dc3def974576d783dea0f5384 | 3(a) | 3 | 5
6531 | United Kingdom | Energy sector | Thermal power stations and other combustion installations | UK.CAED/BEISOffsh-Alba-Northern.FACILITY | Alba Northern | -- | AIR | Carbon dioxide (CO2) | 2014 | 8 | 28 | EUROPE | 17.045925 | 18.761321 | 19.775313 | 10.219242 | 13.179671 | 14.218888 | 4 | Christopher Little | cfab1ba8c67c7c838db98d666f02a132 | 1(c) | 1 | 5
7259 | United Kingdom | Waste and wastewater management | Landfills (excluding landfills of inert waste and landfills, which were definitely closed before 16.7.2001 or for which the after-care phase required by the competent authorities according to Article 13 of Council Directive 1999/31/EC of 26 April 1999 on the landfill of waste has expired) | UK.CAED/EW_EA-2907.FACILITY | Staple Quarry Landfill | Staple | AIR | Methane (CH4) | 2012 | 8 | 25 | EUROPE | 15.513304 | 18.667550 | 20.428785 | 6.864678 | 7.555134 | 9.931605 | 11 | Isaac Barrett | 2237e275b33fc04c6968575d571f9bf5 | 5(d) | 5 | 5
11 | Austria | Energy sector | Mineral oil and gas refineries | AT.CAED/9008390481905.FACILITY | OMV Austria Exploration u. Production | Aderklaa | AIR | Carbon dioxide (CO2) | 2019 | 2 | 9 | EUROPE | 17.844454 | 18.030750 | 23.899717 | 11.898362 | 13.766943 | 18.349999 | 1 | Michael Perry | a784bdfb9ebb719589cc5e3cbf825cac | 1(a) | 1 | 4
105 | Austria | Paper and wood production and processing | Industrial plants for the production of paper and board and other primary wood products (such as chipboard, fibreboard and plywood) | AT.CAED/9008391215714.FACILITY | W. Hamburger GmbH | Pitten | AIR | Nitrogen oxides (NOX) | 2012 | 2 | 28 | EUROPE | 12.913523 | 18.030439 | 23.694727 | 11.299433 | 11.756286 | 15.701519 | 1 | Kellie Carlson | d7e7a3891aef3c09a84556972f1edc1e | 6(b) | 6 | 4
119 | Austria | Production and processing of metals | Installations for the production of pig iron or steel (primary or secondary melting) including continuous casting | AT.EEA/5868.FACILITY | voestalpine Stahl GmbH | Linz | AIR | Nitrogen oxides (NOX) | 2013 | 1 or 11 | 13 or 3 | EUROPE | 13.950280 | 16.663296 | 19.052091 | 3.911315 | 5.176813 | 7.986721 | 0 | Mark Rodriguez | 443befbacdaa99c161dd11495b82b99b | 2(b) | 2 | 4